Final Project

Group 1:

  • Hiba Awan

  • Nathania Stephens

Abstract

Introduction & Background

Motivation/Purpose

In 2023, there were over 30,000 arrests and close to 65,000 citations in Fairfax County. The county includes areas such as Centreville, Chantilly, Herndon, Reston, Tysons Corner, McLean, Merrifield, George Mason, Annandale, Burke, Springfield, Alexandria, and Lorton, to name a few. If you live, work, or study in these areas, this project should be of interest to you. It aims to inform people in Fairfax County about local crime and to provide statistical insights they can apply.

Goals/Objectives

To provide relevant and insightful crime information, several visualization methods were applied to make the data easy to interpret and compare. Statistical learning techniques were used to identify statistically significant factors and associations between variables. Since the data used for this project is largely categorical, the project focuses on techniques such as the chi-squared test, logistic regression, decision trees, and random forests.

Data

Overview

About the Data

Three datasets were pulled from the Fairfax County Police Department website. They cover arrests, citations, and warnings in the year 2023. For simplicity, general definitions are provided:

Arrest - When a person is taken into custody to answer for an offense or when there is a deprivation or restraint of a person’s liberty in any significant way.

Citation - A formal notice issued by a law enforcement officer for a violation of law, typically related to traffic laws or other minor offenses. It typically requires the violator to appear in court or pay a fine.

Warning - When a violation, typically minor, has been committed but the officer issues a warning rather than a citation.

The following attributes were key to the research conducted:

Column Name   Data Type   Description
Date          Date        Date of violation
Time          Chr         Time of violation
Offense       Chr         Description of violation
Gender        Chr         Gender of violator
Ethnicity     Chr         Hispanic or Non-Hispanic
District      Chr         Administrative area
Latitude      Dbl         Coordinate measuring north/south of the equator
Longitude     Dbl         Coordinate measuring east/west of the prime meridian
Outcome       Chr         Result of violation: arrest, citation, or warning

Limitations and Assumptions

Because the data available on the Fairfax County Police Department (FCPD) website is largely categorical, the analysis was limited to techniques for qualitative variables. The project therefore focuses on classification, that is, predicting a qualitative response: each record pulled from FCPD is assigned to a category or class.

While understanding local crime is the goal of this project, the data acquired only accounts for crime that was recorded by FCPD. It does not take into account crimes that were not reported, or crimes that may have been reported to agencies other than FCPD.

Cleaning and Transformation

Research Questions

  1. Is there an association between gender and warnings?

  2. Are there other factors that determine whether someone gets out of a “ticket”? Alternatively, are you more likely to get a ticket at the end of the month? (Some believe that police officers have a monthly quota.)
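One way to make the end-of-month question testable is to derive an indicator from the Date column. The following is a minimal sketch in Python, assuming that "end of month" means the last seven days of the calendar month. The window length and the function name `is_end_of_month` are illustrative choices, not definitions taken from the FCPD data.

```python
import calendar
from datetime import date


def is_end_of_month(d: date, window: int = 7) -> bool:
    """Return True if d falls within the last `window` days of its month.

    The 7-day default is an assumption made for illustration only.
    """
    # monthrange returns (weekday of the 1st, number of days in the month).
    days_in_month = calendar.monthrange(d.year, d.month)[1]
    return d.day > days_in_month - window


# Example dates from 2023, the year covered by the datasets.
examples = [date(2023, 1, 24), date(2023, 1, 25), date(2023, 2, 22)]
flags = [is_end_of_month(d) for d in examples]
print(flags)
```

A flag like this could then be cross-tabulated against the outcome in the same way as gender.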

Exploratory Analysis

To address each of these questions, exploratory analysis is done first to summarize the crime metrics for Fairfax County. This includes understanding which types of crime occurred most often and where.

General Crime

Mapping the arrest data gives a geospatial view of where arrests occur.

Next, we look at the top 10 arrest types by Incident-Based Reporting (IBR) code.

Next, we examine the top 10 citations.

Warning Versus Citation

Next, warnings are compared with citations. This helps show which factors could play into getting a warning rather than a citation.

# A tibble: 11 × 3
   District         n Proportion
   <chr>        <int>      <dbl>
 1 Sully        18612  0.208    
 2 Springfield  12581  0.140    
 3 Braddock     10292  0.115    
 4 Franconia    10033  0.112    
 5 Hunter Mill   8718  0.0972   
 6 Mason         8168  0.0911   
 7 Dranesville   7143  0.0797   
 8 Providence    6713  0.0749   
 9 Mount Vernon  6113  0.0682   
10 Unverified    1281  0.0143   
11 <NA>             1  0.0000112
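
As a rough cross-check, the Proportion column can be recomputed from the counts n in the table above. The sketch below is in Python with the counts hardcoded from that table. It is not the code that produced the output, and the key "NA" is just a stand-in label for the missing-district row.

```python
# Counts per district, copied from the tibble above.
# "NA" is a stand-in label for the row with a missing district.
district_counts = {
    "Sully": 18612,
    "Springfield": 12581,
    "Braddock": 10292,
    "Franconia": 10033,
    "Hunter Mill": 8718,
    "Mason": 8168,
    "Dranesville": 7143,
    "Providence": 6713,
    "Mount Vernon": 6113,
    "Unverified": 1281,
    "NA": 1,
}

total = sum(district_counts.values())

# Share of all records per district, largest first.
proportions = {
    district: count / total
    for district, count in sorted(
        district_counts.items(), key=lambda kv: kv[1], reverse=True
    )
}

for district, share in proportions.items():
    print(f"{district:<13}{district_counts[district]:>6}  {share:.3g}")
```

The counts sum to 89,655 records, and Sully alone accounts for roughly one in five of them.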

        BinaryOutcome
Gender       0     1
  Female 20478  8777
  Male   43657 15408

    Pearson's Chi-squared test with Yates' continuity correction

data:  contingency_tbl
X-squared = 150.62, df = 1, p-value < 2.2e-16
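
The test statistic above can be reproduced by hand from the Gender by BinaryOutcome table. The sketch below is in Python using only the standard library. It is an independent recomputation, not the code behind the output, and it makes no assumption about which outcome the value 1 encodes. The function name `yates_chi_squared` is illustrative.

```python
import math

# Observed counts from the Gender x BinaryOutcome table above.
# Rows: Female, Male. Columns: BinaryOutcome 0, BinaryOutcome 1.
observed = [[20478, 8777], [43657, 15408]]


def yates_chi_squared(table):
    """Pearson chi-squared statistic for a 2x2 table with Yates' correction."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, obs in enumerate(row):
            # Expected count under independence of gender and outcome.
            expected = row_totals[i] * col_totals[j] / n
            # Yates: shrink |observed - expected| by 0.5, never below zero.
            diff = max(abs(obs - expected) - 0.5, 0.0)
            stat += diff ** 2 / expected
    return stat


statistic = yates_chi_squared(observed)

# With 1 degree of freedom, the upper-tail p-value is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

# Share of records with BinaryOutcome == 1 within each gender.
share_female = observed[0][1] / sum(observed[0])
share_male = observed[1][1] / sum(observed[1])

print(f"X-squared = {statistic:.2f}, p-value = {p_value:.3g}")
print(f"Outcome 1 share: Female {share_female:.3f}, Male {share_male:.3f}")
```

The recomputed statistic agrees with the reported X-squared of 150.62. The row shares put the difference in context: about 30.0% of records for women have BinaryOutcome 1, compared with about 26.1% for men. With roughly 88,000 records, even a gap of this size gives a very small p-value.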

Conclusion

References